DEBUG: instrument http2 pipe timeout on darwin (do not merge)#32002
DEBUG: instrument http2 pipe timeout on darwin (do not merge)#32002cirospaciari wants to merge 62 commits into
Conversation
Bump the compat target from Node 24.3.0 to 26.3.0: - nodejs-headers 26.3.0, NODE_MODULE_VERSION 147, bootstrap/flake pins - process.versions.v8 = 14.6.202.34-node.20, modules = 147 - V8 shim updated for 14.6: Isolate roots layout, flattened FunctionCallbackInfo exit frame, String Write/WriteOneByte/WriteUtf8 V2 + Utf8LengthV2 (legacy exports kept), External::New pointer-tag overload, Number::NewFromInt32/NewFromUint32, HandleScope::Extend/DeleteExtensions - napi_get_value_string_* no longer forms a slice from a null buffer pointer when only querying the encoded length - v8/napi test fixtures migrated to the V2 string APIs
Per https://streams.spec.whatwg.org/#readable-stream-cancel step 5, a BYOB reader's pending read requests must be closed with {value: undefined, done: true} when the stream is cancelled. readableStreamClose only settles default-reader read requests, so a pending byob read() hung forever after reader.cancel().
Port the observable behavior changes between Node v24.3.0 and v26.3.0:
streams:
- read() with no size returns one buffered chunk at a time in paused mode
- pause()/resume() are no-ops on destroyed streams; compose() moved to the
prototype and returns the composed Duplex directly
- Duplex.from(async generator) destroy aborts with the error and unblocks a
parked iterator; duplexPair destroy propagates to the other side
- eos()/finished(): conditional AsyncLocalStorage binding, immediate-result
precompute for already-settled streams, real errors override AbortError in
pipeline()
- webstreams adapters: Writable.toWeb no longer hangs on synchronous drain,
fromWeb writev failures error the stream instead of an unhandled rejection,
Readable.toWeb supports type 'bytes' (BYOB), Duplex.toWeb readableType
(DEP0201 for the old alias), ArrayBuffer/SharedArrayBuffer chunk handling,
brotli decoder errors surface as TypeError with the original code
- write(string, 'buffer') throws ERR_UNKNOWN_ENCODING
http:
- ServerResponse.writeHeader removed (DEP0063 end-of-life)
- upgrade requests with no 'upgrade' listener no longer vanish
- getHeader('set-cookie') returns undefined when absent, and
setHeader('set-cookie', []) round-trips as a present-but-empty array
- remove dead half-ported proxy symbols in Agent, https.Agent no longer
shadows createConnection, drain is only emitted when needed
- http2: respond() rejects raw-header arrays instead of sending garbage
frames, request() on closed sessions uses the right error codes, option
range errors use the options. prefix, initialWindowSize is capped
Vendored the matching upstream tests and updated the v24-era ones whose
expectations changed.
… images] The previous commit's message described the full compat bump but the tracked file changes were lost to a stray hard reset before committing — only the new header made it in. This restores them: - nodejs-headers 26.3.0 / NODE_MODULE_VERSION 147, bootstrap + flake pins, process.versions.v8 = 14.6.202.34-node.20 - V8 shim for 14.6: Isolate roots layout, flattened FunctionCallbackInfo exit frame, String Write/WriteOneByte/WriteUtf8 V2 + Utf8LengthV2 (legacy exports kept), External::New pointer-tag overload, Number::NewFromInt32/NewFromUint32, HandleScope::Extend/DeleteExtensions; export lists updated for all platforms - napi_get_value_string_* no longer forms a slice from a null buffer pointer when only querying the encoded length - v8/napi test fixtures migrated to the V2 string APIs On top of the restoration, review fixes to the shim: - DeleteExtensions now actually reclaims the slots Extend granted inside a closing scope (per-iteration v8::HandleScopes in a long native call no longer accumulate handles until the callback returns), with a debug assert against out-of-LIFO scope destruction - WriteUtf8V2 encodes UTF-16 in chunks so the 32-bit counts from the encoder can't wrap for >4GB outputs, and uses view() instead of flattening ropes - Utf8LengthV2 skips the scalar surrogate scan for valid UTF-16 (SIMD validate first); WriteV2/WriteOneByteV2 use vectorized copies - FunctionCallbackInfo exit frame stays on the stack for up to 16 arguments - version strings come from a single source of truth: REPORTED_NODEJS_V8_VERSION is defined from scripts/build/deps/nodejs-headers.ts and used by both process-report objects, alongside the existing REPORTED_NODEJS_ABI_VERSION
- _http_outgoing: drop the lazy outputData prototype accessor — reading it with `this` set to the prototype (spread, Object.assign, inspection) defineProperty'd a single shared array onto the prototype, reinstating the cross-instance leak it was meant to prevent, and reads on frozen instances threw. Like Node, the prototype now has no outputData property; methods lazily init for subclasses that don't chain the constructor. - _http_outgoing: getRawHeaderNames() now reports a present-but-empty set-cookie under its original casing (kEmptySetCookie stores the name); setHeader skips the per-call toLowerCase for names that can't be set-cookie. - ERR_HTTP2_GOAWAY_SESSION moved into the ErrorCode table instead of a hand-rolled Error factory. Appended at the end of the list: the table's discriminants are index-aligned with the checked-in Rust mirror (src/jsc/ErrorCode.rs). Also declared ErrorCode.ts as an input of the bundle-modules codegen step — error indices are baked into the bundled JS by replacements.ts, and an ErrorCode.ts edit previously left stale numbers in the bundles (errors thrown from builtins came out as the wrong code). - CompressionStream/DecompressionStream share one buffer-source validator via newBufferSourceTransformPairFromDuplex instead of two verbatim copies. - primordials: SafePromiseAllReturnVoid/ReturnArrayLike share one scheduler. - end-of-stream: deduplicate the immediate-result callback invocation. - node-stream.test.js: remove a byte-identical duplicate of the "node v26 stream semantics" describe block. - flake.nix: fix stale Node 24 comment.
…tput [build images]
xwin's splat phase moves unpacked SDK files into place with rename(2). With
the cache under the download dir on tmpfs and /opt/winsysroot on the root
filesystem, every move fails with EXDEV ("Cross-device link"), killing the
linux image bakes. Cache next to the output instead and clean it up after.
The previous head commit came from the formatting bot, whose message lacks the [build images] tag — the pipeline then asked for published v35 images that don't exist yet.
…ild images] The private S3 mirror has no v26.3.0 musl tarballs, which 403'd the Alpine image bakes. nodejs/unofficial-builds publishes both linux-x64-musl and linux-arm64-musl for current releases (the mirror predates arm64-musl being available there), so pull straight from it and skip the manual re-upload on every Node bump.
…ild images] - process.config.variables: add enable_thin_lto=false and lto_jobs="" to match Node 26. Its common.gypi evaluates these in MSVS settings conditions and gyp hard-fails on undefined variables, breaking every node-gyp build on Windows (v8/napi/dlopen suites). - The cipher streaming tests assumed read() with no size returns the whole buffered ciphertext; since Node 26 it returns one chunk at a time, so the fixtures truncated the ciphertext and OpenSSL reported BAD_DECRYPT — real Node 26 fails the old fixtures identically. Drain read() in a loop and add the second cipher data event Node 26 emits (output verified against Node 26.3.0 byte-for-byte modulo quote style); drop the stale TODO claiming the readable/prefinish order was a bug.
…tions [build images] Deliberate divergence from Node 26 (nodejs/node#62557 made pause()/resume() early-return on destroyed streams). Legacy Readable subclasses like fd-slicer assign `this.destroyed = true` — which hits the prototype setter on modern streams — right before push(null). With the upstream guard, a piped destination's 'drain' can no longer resume the source, so the last buffered chunk is silently dropped and the pipeline never finishes. In practice that breaks yauzl → extract-zip → puppeteer browser installs and other zip/tar tooling, and the same hang reproduces on Node 26.3.0 itself. Keep the Node 24 behavior of letting such streams flush their buffer, and replace the no-op assertion test with a regression test for the fd-slicer pattern (verified: extract-zip now extracts a full chrome-headless-shell archive instead of stopping after the first entries).
…ed [build images]
The dev-server-puppeteer launcher already prefers a system Chromium when one
is installed (CI bootstraps one on every Linux flavor), and several CI
platforms have no Chrome for Testing build at all (linux-arm64,
windows-arm64). The unconditional download during bun install wasted ~150MB
per run and, worse, a half-finished download left in the shared agent cache
by an earlier failed run makes @puppeteer/browsers refuse every later
install ("browser folder exists but the executable is missing"). Set
PUPPETEER_SKIP_DOWNLOAD for the install step when a system browser exists or
the platform can't run the downloaded one.
…, goaway semantics [build images] - v8: reserve the EscapableHandleScope escape slot at construction (like real V8) instead of allocating it inside Escape(). Allocated at Escape() time it sits above any in-scope inline handle grants, so the inline ~HandleScope's DeleteExtensions swept the just-escaped handle together with the scope. The reservation lives in a side registry keyed by the scope's address because the V8 ABI leaves exactly one usable word in the object; entries are purged when their buffer clears, and a reused stack address overwrites its stale entry. New fixture test creates inline Local copies inside the scope before escaping. Also: createRawHandleSlot now does its two bookkeeping steps under one lock. - _http_outgoing: the headers getter/setter, appendHeader and getRawHeaderNames now agree with the kEmptySetCookie marker — the getter mirrors getHeaders(), replacing the header bag or appending a cookie drops the marker, and getRawHeaderNames can't report set-cookie twice. - FetchHeaders.getRawKeys: the HTTPHeaderMap iterator only walks the common and uncommon segments while size() also counts set-cookie values, so the returned array had trailing holes whenever set-cookie was present; size for unique names and append "Set-Cookie" explicitly. - http2: receiving GOAWAY now mirrors Node — NGHTTP2_NO_ERROR begins a graceful close() (request() then throws ERR_HTTP2_GOAWAY_SESSION), any other code destroys the session with ERR_HTTP2_SESSION_ERROR. The rejected-stream test asserted the old behavior; its expectation now matches what a real Node 26 client reports for the same goaway. - tests: hoist the _http_common require to module scope, drop a redundant in-test require, probe the prototype's outputData directly instead of spreading the prototype (which invoked unrelated deprecated getters), and share the puppeteer skip-download env via a harness helper.
…uest [build images] Two fallouts from mirroring Node's GOAWAY handling surfaced by CI: - After a graceful close(), the socket dying left active streams hanging forever: #onClose ran stream.close(NGHTTP2_CANCEL) and then detached the parser, so the writable side could never finish and 'close' never fired (test-http2-server-shutdown-options-errors timed out). Mirror Node's closeSession: hard-destroy every stream that survives the cancel pass, on both client and server sessions. - A request created with an already-aborted signal went through the native parser, which RST'd a stream the peer never saw — a connection error that makes conforming servers reply GOAWAY(INTERNAL_ERROR). The old code swallowed that session error; now that goaway destroys the session with ERR_HTTP2_SESSION_ERROR like Node, the noise became visible. Like Node, never touch the wire: create an id-less stream and destroy it with an AbortError on the next tick (the stream reports aborted=true and rstCode CANCEL but does not emit 'aborted' since the request never started). Also for CI on Windows: pass -Denable_lto=false -Denable_thin_lto=false -Dlto_jobs= to node-gyp in the v8/napi/dlopen test builds. A clang-cl-built Node 26 reports thin-LTO build flags in process.config, node-gyp copies them into addon builds, and MSVC's link.exe hard-fails on /opt:lldltojobs. The defines are no-ops elsewhere. And when puppeteer's browser download can't be skipped, install into a fresh per-run PUPPETEER_CACHE_DIR (forwarded to the launcher) so a half-extracted download left in the shared agent cache by an earlier failed run can't block the install.
The exported ~HandleScope popped the Bun scope stack and cleared its buffer but left the isolate's HandleScopeData alone. If Extend had granted slots from that buffer while the scope was current (an addon's inline Local::New inside e.g. an Array::Iterate callback), next/limit kept pointing into the cleared buffer; the next inline v8::HandleScope then snapshotted that stale limit as prev_limit_, and its DeleteExtensions ran deleteGrantsBack against the enclosing buffer where no grant matches a foreign address — popping every grant, including handles of still-open outer scopes. Snapshot next/limit at scope push (stored in the scope's buffer cell — the 3-word V8 ABI leaves no room in the frame itself) and write them back on pop. New fixture test drives exactly this through Array::Iterate. Also gate the duckdb smoke test on NODE_MODULE_VERSION <= 137: the deprecated duckdb package publishes no prebuilts past ABI 137 (checked npm.duckdb.org for 1.3.1-1.4.1), and node-pre-gyp's fallback source build is not viable in CI. The gate can go away if the test migrates to the N-API based @duckdb/node-api.
Matches node's ordering and semantics (verified against v26.3.0):
validateAbortSignal accepts any object with an 'aborted' property, so a
duck-typed { aborted: true } takes the abort fast path — but objects
without 'aborted' and non-objects throw ERR_INVALID_ARG_TYPE
synchronously, before the fast path can read .aborted.
Azure repeatedly failed to allocate Standard_D4ds_v6 in-region (AllocationFailed) during image bakes; D4as_v7 is one of the alternatives Azure's allocation guidance suggests. Build-only VM — CI runner sizes are unchanged in ci.mjs.
A build without the [build images] tag fell back to published v35 images, and stale v35 alpine images exist from before the Node 26.3.0 install landed (bootstrap already said Version: 35 while still installing 24.3.0) — agents came up with Node 24, failing the version check and building ABI-137 addons that an ABI-147 runtime rightly refuses to load. Move to v36/v21 so the poisoned tags can never be selected again.
…r bun [build images] - Http2Session.close() (client and server) now sends GOAWAY immediately like Node (core.js marks the session closed and calls goaway() right away), so peers stop routing new work to a draining session; the client goaway() gained Node's default arguments. The grpc-js unbind test additionally accepts CANCELLED: a call that wins the race onto the draining session before the GOAWAY/socket-close is processed is torn down with NGHTTP2_CANCEL — the same in-flight teardown Node performs at socket close; which of the three statuses you get is pick-timing. - Run every test-suite node-gyp build under bun (--bun): the gyp -D defines could not neutralize the thin-LTO flags a clang-cl-built Node 26 carries in process.config.target_defaults (node-gyp copies them into config.gypi and MSVC's link.exe fails on /opt:lldltojobs), and the system Node's ABI may not match ours at all. Bun reports the same ABI (147) with clean target_defaults, so the produced modules load in node 26 unchanged. The dlopen fixtures invoke the bun under test explicitly and their beforeAll hooks get a real timeout — the builds legitimately exceed the 5s default under a debug/ASAN binary.
Same rationale as the previous commit's dlopen/napi change — this invocation was missed: gyp -D defines can't override the thin-LTO flags a clang-cl-built Node carries in process.config.target_defaults, so build the node-side module with bun (--bun) as well; it targets the same ABI (147) and loads in node 26 unchanged. Also drop the now-redundant defines from the bun-side invocation.
…ness [build images] common.gypi only applies CLANG_CXX_LANGUAGE_STANDARD (gnu++20) to addon builds when the clang variable is 1, which real Node reports on macOS. With bun running node-gyp and reporting clang=0, Apple clang compiled addons at its ancient default standard and every node-gyp build on darwin failed on <type_traits> (std::is_enum_v) in the Node 26 headers. The v8 harness now includes the child's exit code in crash messages — the Windows fixture crashes currently report nothing, and the code distinguishes an access violation from a DLL load failure.
sharp has no win32-arm64 prebuilt, so its install script falls back to a node-gyp source build under the system Node — which the clang-cl-built Node 26 breaks (process.config leaks thin-LTO flags that MSVC's link.exe rejects with LNK1117). The test exercises package-lock.json migration, not lifecycle scripts, so pass --ignore-scripts there.
…[build images] The puppeteer-based next-pages tests still hit flaky per-run Chrome for Testing downloads on ubuntu/debian x64 (the only CI platforms with neither a system browser nor a skip gate). Bake google-chrome-stable into those images and teach the browser lookups about its binary names — with a system browser present the tests skip the ~300MB download entirely and launch the system binary, like they already do on alpine.
These were in symbols.txt/.dyn (so Linux and macOS resolved them) but never in the Windows .def. Node addons delay-load the host's exports, so the gap only surfaces at the first call: the v8 fixture tests that exercise these type checks died with the delay-load "procedure not found" SEH status (0xC06D007F, reported as exit code 127) after printing their first lines. Manglings verified against clang-cl and the remaining 83 v8 exports audited for further gaps (none).
…[build images] The exit-127 crashes on windows-aarch64 print nothing: stdout shows the test started and stderr is empty. The exe exports every referenced v8 symbol (checked both pre- and post-61197 artifacts against the addon's full import set), so the delay-load theory needs narrowing. Markers go to stderr, which checkSameOutput never compares but does include in crash messages, so the next CI run names the exact call that dies.
Root cause of the Windows --debug fixture crashes (exit 127), found by reproducing under wine with the CI binary and a clang-cl-built debug addon: V8_INLINE members whose bodies live at namespace scope in the headers — HandleScope::CreateHandle(v8::Isolate*, Address), HandleScope::Initialize, and Value::QuickIsUndefined/QuickIsNull/QuickIsNullOrUndefined/QuickIsString — are imported rather than emitted by MSVC debug builds of a dllimport class. Real Node exports its entire V8 surface so this works there; Bun's curated export list didn't include them, and the delay-load helper's "procedure not found" SEH status (0xC06D007F) surfaces as exit code 127 at the first call. That also explains the seemingly random crash points across tests: each died at its own first imported-inline call. Implementations are thin: the CreateHandle overload forwards to the internal::Isolate* variant (same object), Initialize performs V8's inline frame setup (snapshot next/limit, level++ — never pushing a Bun scope, like EscapableHandleScopeBase), and the QuickIs* checks are the corresponding JSC value checks. Exported on all platforms for symmetry; ELF builds emit inline copies locally so only Windows strictly needs them. Verified under wine: with the missing import resolved, test_v8_global, test_v8_escapable_handle_scope_inline_grants and test_v8_locals_survive_nested_call all pass with a debug-CRT fixture.
…d images]
A graceful close() sends GOAWAY through the JS default lastStreamId = 0, and
the native parser only fell back to the session's real last stream id for
undefined/null — a literal 0 went on the wire, telling the peer per RFC 9113
§6.8 that no streams were processed and everything is retriable. Node's JS
layer has the same 0 default and relies on its native correction
(lastStreamID <= 0 → nghttp2_session_get_last_proc_stream_id); mirror that
in the parser so every caller (close(), destroy(), user goaway) is covered.
Also: the server session's goaway handler now matches the client's (and
Node's onGoawayData) — NO_ERROR begins a graceful close, any other code
destroys with ERR_HTTP2_SESSION_ERROR; corrected a comment that claimed
duck-typed { aborted } signals throw (validateAbortSignal accepts any object
with an 'aborted' property); and stripped the v8 fixture's temporary stderr
step markers now that the Windows exit-127 is root-caused.
A server session that received a graceful GOAWAY marks itself closed and defers destruction to the moment its last stream closes. If the peer's socket disappears before that stream-close arrives (the client sends GOAWAY and ends the socket in the same tick, and on macOS the frame and the EOF are processed in one read batch), that moment never comes: the socket-close handler only called close(), whose closed/destroyed early-return now makes it a no-op, so the session — and the server's open-connection count — stayed alive forever and server.close() never completed. Node's socketOnClose unconditionally tears the session down (close() + closeSession()); do the same by following close() with destroy(), mirroring what the client session's #onClose already does. Fixes the test-http2-compat-serverrequest-pipe.js timeout on the darwin CI fleet (diagnosed with an instrumented build: all request/response events completed, server.close() never fired).
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/js/node/test/parallel/test-http2-compat-serverrequest-pipe.js (1)
12-120:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove debug-only instrumentation from this upstream-sync Node test before merge.
This adds non-upstream watchdog/event-trail behavior (including forced
process.exit(7)), which breaks verbatim-sync expectations fortest/js/node/test/parallel/*and can change test behavior independent of runtime correctness. Please revert this file to upstream content and keep diagnostics in a non-sync/debug-only path.Based on learnings: files under
test/js/node/test/parallel/should remain byte-identical to upstream Node tests, and local debug edits in those files should not be landed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/js/node/test/parallel/test-http2-compat-serverrequest-pipe.js` around lines 12 - 120, The test has debug-only instrumentation (t0, events, ev(), state object, watchdog timer with process.exit, dump(), and extra ev(...) calls) that must be removed — revert the file to the upstream version: delete the watchdog block (watchdog, watchdog.unref(), dump and process.exit(7)), remove the events array/t0/ev() helpers and the state object usage, and undo added logging hooks that reference state/serverSock/serverSession/serverReq/serverRes/serverDest/clientReq/clientStr/clientSession; restore original test behavior and event handlers so the file is byte-identical to upstream.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/js/node/test/parallel/test-http2-compat-serverrequest-pipe.js`:
- Around line 12-120: The test has debug-only instrumentation (t0, events, ev(),
state object, watchdog timer with process.exit, dump(), and extra ev(...) calls)
that must be removed — revert the file to the upstream version: delete the
watchdog block (watchdog, watchdog.unref(), dump and process.exit(7)), remove
the events array/t0/ev() helpers and the state object usage, and undo added
logging hooks that reference
state/serverSock/serverSession/serverReq/serverRes/serverDest/clientReq/clientStr/clientSession;
restore original test behavior and event handlers so the file is byte-identical
to upstream.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8122a21e-75bc-4f23-878c-812f021ac8c7
📒 Files selected for processing (1)
test/js/node/test/parallel/test-http2-compat-serverrequest-pipe.js
b4a7867 to
63a86ef
Compare
|
Diagnosis complete — the root cause (headers-only END_STREAM responses never closing the h2 stream in the frame parser) is fixed in #31991 (28c2ab4) and verified on the same darwin agents that reproduced the hang. A latent macOS kqueue FIN-redelivery quirk in uSockets observed during this investigation will be filed separately. |
Throwaway diagnostic branch for the darwin-aarch64 timeout of test-http2-compat-serverrequest-pipe.js seen on #31991. Runs a single instrumented test on a darwin-only pipeline. Will be closed once the root cause is identified.